A Dataset for Pull Request Research
Abstract
Pull requests are a new method for collaborating in distributed software development. To study the pull request distributed development model, we constructed a dataset of almost 900 projects and 350,000 pull requests, including some of the largest users of pull requests on GitHub. In this paper, we describe how the projects were selected, analyze the selected features, and present a machine learning tool set for the R statistics environment.
Similar resources
Rumor Spreading with Bounded In-Degree
We consider a variant of the well-studied gossip-based model of communication for disseminating information in a network. Classically, in each time unit, every node u is allowed to contact a single random neighbor v. If u knows the data (rumor) to be disseminated, node v learns it (known as push) and if node v knows the rumor, u learns it (known as pull). While in the classic gossip model, each...
Performance Guarantee in a New Hybrid Push-Pull Scheduling Algorithm
The rapid growth of web services has already given birth to a set of data dissemination applications. Efficient scheduling techniques are necessary to endow such applications with advanced data processing capability. In this paper we have effectively combined broadcasting of very popular (push) data and dissemination of less popular (pull) data to develop a new hybrid scheduling scheme. The sep...
The Experience of Families of Brain-Dead Patients Who Are Candidates for an Organ Donation Request: A Qualitative Study
The aim of this study was to explore the experiences of family members of patients confronting a brain death diagnosis and the request for organ donation. A qualitative study was designed focusing on content analysis. The data collection process included field notes and 38 unstructured in-depth interviews with relatives of 26 brain-dead patients who were candidates for organ donation. Sampling method beg...
Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment?
Context: The pull-based model, widely used in distributed software development, offers an extremely low barrier to entry for potential contributors (anyone can submit contributions to any project through pull-requests). Meanwhile, the project’s core team must act as guardians of code quality, ensuring that pull-requests are carefully inspected before being merged into the main development l...
Global Status and Trends in Intellectual Property Claims: Patent Dataset for Biodiversity
This patent dataset is made available by the authors to encourage further research and methodological development. In making the dataset available in an open access journal our aim is to encourage greater research and data sharing on intellectual property and biodiversity. On that basis the sole condition of use is attribution of authorship. Excel files are available from the authors upon request.